TexAFon 2.0: A text processing tool for the generation of expressive speech in TTS applications
Abstract
This paper presents TexAFon 2.0, an improved version of the text processing tool TexAFon, specially oriented to the generation of synthetic speech with expressive content. TexAFon is a text processing module for Catalan and Spanish TTS systems which performs all the typical tasks needed to generate synthetic speech from text: sentence detection, pre-processing, phonetic transcription, syllabication, prosodic segmentation and stress prediction. The improvements in version 2.0 include a new normalisation module for the standardisation of chat text in Spanish, a module for the detection of the emotions expressed in the input text, and a module for the automatic detection of the intended speech acts, all of which are briefly described in the paper. The results of the evaluations carried out for each module are also presented.
Similar resources
Audio Visual Speech Synthesis and Speech Recognition for Hindi Language
Every person in the world wants to share information and thoughts with others, so communication plays a very important role. Speech is the primary means of communication. Hindi is a very popular and well-known language of India; everybody understands, speaks and writes it easily. Our system is developed for Hindi text-to-speech and speech-to-text conversion mainly into the Hin...
Semantics and Discourse Processing for Expressive TTS
In this paper we present ongoing work to produce an expressive TTS reader that can be used both in text and dialogue applications. The system has been previously used to read (English) poetry and it has now been extended to apply to short stories. The text is fully analyzed both at phonetic and phonological level, and at syntactic and semantic level. The core of the system is the Prosodic Manag...
A comparison of open-source segmentation architectures for dealing with imperfect data from the media in speech synthesis
Traditional Text-To-Speech (TTS) systems have been developed using especially-designed non-expressive scripted recordings. In order to develop a new generation of expressive TTS systems in the Simple4All project, real recordings from the media should be used for training new voices with a whole new range of speaking styles. However, for processing this more spontaneous material, the new systems...
Comparison of chironomic stylization versus statistical modeling of prosody for expressive speech synthesis
Chironomic stylization is the process of real-time modification of intonation contours (f0 and tempo) using drawing/writing gestures with a stylus on a graphic tablet. The question addressed in this research is whether hand-made intonation stylization could improve or degrade expressivity and overall quality, compared to statistical modeling of prosody. A system for expressive TTS in French bas...
Refocussing on the Text No in Text-to-speech
Many Natural Language Processing applications depend crucially on the front end processes that handle the input text and transform it into a form usable by the more “sophisticated” linguistic component of the applications. Despite this crucial role, often these front end processes are considered uninteresting, yet it is not unusual for the perception of the complete application to be affected b...
Publication date: 2014